A Parallel Data Mining Architecture for Massive Data Sets
Authors
Abstract
This paper discusses a parallel data mining architecture that can mine massive data sets efficiently, scanning millions of rows of data per second. In this architecture the mining process is divided into two distinct components. A parallel server, Compaq’s Data Mining Server (DMS), provides a set of data mining primitives; a data mining client, Syllogic’s DMT/MP, uses these primitives to implement the actual data mining algorithms. The paper discusses the parallel architecture, the primitives used to operate on the data, and the mining algorithms’ use of these primitives. Performance figures are presented for both the primitives and the high-level mining algorithms.
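The split between server-side primitives and client-side algorithms can be illustrated with a small sketch. The abstract does not name the primitives DMS offers, so everything here is an assumption made for illustration: the class `ToyMiningServer`, its single `count_by` primitive, and the client function `best_split` are hypothetical and are not the DMS or DMT/MP interfaces. The sketch only shows the general idea that a client can choose a decision-tree split from aggregate counts without ever seeing the rows.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from math import log2


class ToyMiningServer:
    """Hypothetical stand-in for a parallel server (not the DMS interface).

    It holds the rows in partitions and exposes one counting primitive.
    """

    def __init__(self, partitions):
        self.partitions = partitions  # list of partitions, each a list of dict rows

    def count_by(self, columns):
        """Primitive: count rows per distinct combination of values in `columns`."""
        def scan(rows):
            return Counter(tuple(row[c] for c in columns) for row in rows)

        # One scan task per partition; threads are used only to illustrate the
        # structure, a real server would scan partitions on separate processors.
        with ThreadPoolExecutor() as pool:
            partials = list(pool.map(scan, self.partitions))

        total = Counter()
        for partial in partials:
            total.update(partial)  # merge the per-partition counts
        return total


def entropy(counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    counts = list(counts)
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)


def best_split(server, attributes, target):
    """Client-side step: pick the attribute with the highest information gain,
    using only the counts returned by the server primitive."""
    class_counts = server.count_by([target])
    n = sum(class_counts.values())
    base = entropy(class_counts.values())
    gains = {}
    for attr in attributes:
        joint = server.count_by([attr, target])
        by_value = {}
        for (value, _label), c in joint.items():
            by_value.setdefault(value, []).append(c)
        remainder = sum(sum(cs) / n * entropy(cs) for cs in by_value.values())
        gains[attr] = base - remainder
    return max(gains, key=gains.get), gains


# Made-up example data, split into two partitions.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]
server = ToyMiningServer([rows[:2], rows[2:]])
best, gains = best_split(server, ["outlook", "windy"], "play")
```

In this sketch the client issues one counting request per candidate attribute, so the volume of data returned to the client depends on the number of distinct value combinations rather than on the number of rows scanned.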
Similar resources
TeraScope: distributed visual data mining of terascale data sets over photonic networks
TeraScope is a framework and a suite of tools for interactively browsing and visualizing large terascale data sets. Unique to TeraScope is its utilization of the Optiputer paradigm to treat distributed computer clusters as a single giant computer, where the dedicated optical networks that connect the clusters serve as the computer’s system bus. TeraScope explores one aspect of the Optiputer arc...
Mafia: Efficient and Scalable Subspace Clustering for Very Large Data Sets (Center for Parallel and Distributed Computing)
Clustering techniques are used in database mining for finding interesting patterns in high dimensional data. These are useful in various applications of knowledge discovery in databases. Some challenges in clustering for large data sets in terms of scalability, data distribution, understanding end-results, and sensitivity to input order, have received attention in the recent past. Recent approach...
A parallel method for computing rough set approximations
Massive data mining and knowledge discovery present a tremendous challenge with the data volume growing at an unprecedented rate. Rough set theory has been successfully applied in data mining. The lower and upper approximations are basic concepts in rough set theory. The effective computation of approximations is vital for improving the performance of data mining or other related tasks. The rec...
Parallel Wavelet Transform for Spatio-temporal Outlier Detection in Large Meteorological Data
This paper describes a state-of-the-art parallel data mining solution that employs wavelet analysis for scalable outlier detection in large complex spatio-temporal data. The algorithm has been implemented on multiprocessor architecture and evaluated on real-world meteorological data. Our solution on high-performance architecture can process massive and complex spatial data at reasonable time an...
Fast Parallel Mining of Frequent Itemsets
Association rule mining has become an essential data mining technique in various fields and the massive growth of the available data demands more and more computational power. To address this issue, it is necessary to study parallel implementations of such algorithms. In this paper, we propose a parallel approach to the Frequent Pattern Tree (FP-Tree) algorithm, which is a fast and popular tree...
Publication date: 1999